Journal of Proteome Research
● American Chemical Society (ACS)
Preprints posted in the last 30 days, ranked by how well they match Journal of Proteome Research's content profile, based on 215 papers previously published here. The average preprint has a 0.14% match score for this journal, so anything above that is already an above-average fit.
Ambrose, E. A.; Kandasamy, G.; Meulener, M. M.; Zhang, F.
Many proteomics protocols rely on enzymatic digestion of complex protein mixtures to generate peptides with predictable cleavage patterns for mass spectrometry analysis. One of the most widely used enzymes, trypsin, is classically defined as a serine endopeptidase with high specificity for cleaving peptide bonds on the C-terminal side of internal lysine and arginine residues. Accordingly, trypsin is not expected to remove an N-terminal arginine, which may arise through posttranslational modification such as arginylation or through proteolysis that exposes internal residues as new N-termini. N-terminal arginine plays important biological roles, including functioning as an N-degron and modulating protein interactions and signaling through its positive charge. Curiously, prior mass spectrometry-based studies using trypsin to identify proteins bearing N-terminal arginine have frequently reported low and inconsistent yields, suggesting a potential systematic bias in current proteomic approaches. Here, we explored whether trypsin affects the integrity of N-terminal arginine. Using antibodies that specifically recognize N-terminal arginine on different peptides, together with mass spectrometry-based peptide analysis, we show that trypsin can remove N-terminal arginine residues in an exopeptidase-like manner. This effect occurs on both peptides and whole proteins across a range of digestion conditions consistent with standard proteomic workflows, and depends on trypsin concentration, incubation time, and catalytic activity. In addition, we show that the alternative arginine-cleaving enzyme Arg-C can also affect N-terminal arginine in a sequence-dependent manner. In contrast, Lys-C and LysargiNase do not exhibit such effects, providing suitable alternative digestion strategies. 
Together, these findings reveal an unappreciated enzymatic behavior of arginine-cleaving proteases and suggest that their widespread use may systematically compromise the detection of N-terminal arginine in proteomic studies.
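The contrast between the classical specificity rule and the reported exopeptidase-like activity can be illustrated with a minimal in silico digest. This is a sketch under stated assumptions, not the authors' method: the function name `tryptic_digest` and the example sequence are hypothetical, the "no cleavage before proline" rule is a common in silico convention rather than something stated in the abstract, and the `remove_n_terminal_r` flag simply models the N-terminal arginine removal the authors describe.

```python
from typing import List


def tryptic_digest(seq: str, remove_n_terminal_r: bool = False) -> List[str]:
    """Hypothetical in silico trypsin digest (illustrative sketch only).

    Classical rule: cleave C-terminal to *internal* K or R only, so an
    N-terminal R is never released on its own. Assumed convention (not from
    the source): no cleavage when the next residue is P.
    remove_n_terminal_r=True mimics the exopeptidase-like removal of an
    N-terminal arginine reported in the abstract.
    """
    peptides: List[str] = []
    if remove_n_terminal_r and seq.startswith("R"):
        peptides.append("R")  # free arginine released from the N-terminus
        seq = seq[1:]
    start = 0
    # Skip position 0: a terminal residue is not an internal cleavage site.
    for i in range(1, len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if seq[start:]:
        peptides.append(seq[start:])
    return peptides


# Hypothetical substrate carrying an N-terminal arginine.
print(tryptic_digest("RDAEFRHDSGYEVHHQK"))
print(tryptic_digest("RDAEFRHDSGYEVHHQK", remove_n_terminal_r=True))
```

Under the classical rule the N-terminal arginine stays on the first peptide (`RDAEFR`); if it is clipped, the same region is instead observed as `DAEFR`, so a search that expects the arginylated form would miss it.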
Paradeisi, F.; Gonidaki, C.; Tserga, A.; Courraud, J.; Bakouros, P.; Karousi, P.; Kostopoulos, I. V.; Margelos, T.; Goula, E.; Stegehuis, C.; Meylahn, J. M.; Martzakli, A.; Liacos, C. I.; Dimopoulos, M. A.; Tsitsilonis, O.; Vlahou, A.; Zoidakis, J.; Kastritis, E.
Background: Multiple myeloma (MM) remains incurable despite therapeutic advances, reflecting limited understanding of the molecular mechanisms underlying disease initiation and progression. MM develops through asymptomatic precursor stages, monoclonal gammopathy of undetermined significance (MGUS) and smouldering multiple myeloma (SMM). This study aimed to investigate protein changes associated with disease progression and, through a further integrative approach, to highlight molecular changes of potential predictive and/or therapeutic value. Methods: We performed a comparative proteomic analysis of 94 bone marrow-derived CD138+-selected plasma cell samples (29 MGUS, 20 SMM, and 45 MM) using LC-MS/MS. Differential protein abundance was assessed using pairwise Mann-Whitney U tests between groups, with Benjamini-Hochberg correction. Pathway enrichment, protein-protein interaction, and co-expression network analyses were also conducted. Selected proteins were further evaluated using public transcriptomic datasets and experimentally validated in independent samples by flow cytometry and enzyme-linked immunosorbent assay (ELISA). Results: Following data processing, proteomic analysis identified 6,203 proteins. Pairwise comparisons revealed significant proteomic differences across disease stages, with 370 differentially abundant proteins exhibiting monotonic changes during disease progression. Pathway analysis showed that monotonically upregulated proteins were mainly associated with gene expression and cell proliferation, whereas downregulated proteins were linked to immune-related processes. Further co-expression network analysis, combined with criteria including detection frequency, biological relevance, and translational potential, highlighted a group of prioritised proteins. 
Representative examples include nucleolin (NCL) and U3 small nucleolar ribonucleoprotein IMP3 (IMP3), involved in nucleolar organisation, ribosome biogenesis and rRNA processing, as well as the immune-associated lactotransferrin (LTF) and serine protease cathepsin G (CTSG). Transcriptomic support and independent experimental validation by flow cytometry and ELISA confirmed the relevance of selected candidates. Conclusions: Taken together, our findings highlight coordinated changes in immune regulation, RNA processing and ribosome biogenesis during MM progression and identify candidate proteins and their networks, including the emerging pharmacologically tractable target NCL and the underexplored IMP3 of potential therapeutic relevance, opening new avenues for further investigation.
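The statistical filtering described in the Methods, pairwise tests followed by Benjamini-Hochberg correction and selection of proteins that change monotonically across stages, can be sketched as follows. This is illustrative only: the function names are hypothetical, the Mann-Whitney U p-values are assumed to be computed elsewhere (for example with `scipy.stats.mannwhitneyu`), and defining a "monotonic change" through group medians is an assumption, since the abstract does not give the exact criterion.

```python
from statistics import median
from typing import List, Optional, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def monotonic_direction(groups: Sequence[Sequence[float]]) -> Optional[str]:
    """'up' or 'down' if group medians change strictly in one direction.

    Groups are assumed to be ordered by stage (e.g. MGUS, SMM, MM);
    returns None when the medians are not strictly monotonic.
    """
    meds = [median(g) for g in groups]
    pairs = list(zip(meds, meds[1:]))
    if all(a < b for a, b in pairs):
        return "up"
    if all(a > b for a, b in pairs):
        return "down"
    return None
```

A protein would then be retained only if its adjusted p-values pass the chosen threshold and `monotonic_direction` returns a direction.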
Valdes-Tresanco, M. E.; Wacker, S.; Valdes-Tresanco, M. S.; Plakhotnyk, A.; Brodie, N. I.; Hepburn, M.; Ulke-Lemee, A.; Huttlin, E. L.; Lewis, I. A.
In recent years, proteomics has moved increasingly towards the analysis of large cohorts of biological specimens. This has been made possible by significant improvements in mass spectrometry technology, chromatographic separation methods, and data acquisition strategies. These technological advances now routinely enable experiments that yield datasets far exceeding the capacity of existing proteomics data analysis approaches. Processing such large datasets requires purpose-built quality-control tools designed to organize and analyze the data while recording all processing parameters for reproducibility. To address this need, we developed an open-source, Python-based software platform, Large-scale Automated Multi-level Proteomics Evaluation by Python (LAMPrEY), a comprehensive quality-control pipeline for quantitative proteomics analyses of large sample cohorts. LAMPrEY features GUI-based file submission, automated processing with MaxQuant and RawTools, an interactive analytics dashboard, and an application programming interface (API) for programmatic use, which collectively enable rapid, reproducible analysis and interpretation of proteomics data. We demonstrate the longitudinal monitoring and analytical capabilities of LAMPrEY using TMT11 quantitative proteomics data generated from 910 Enterococcus faecium isolates collected from patients with bloodstream infections. LAMPrEY is open-source software available at www.lewisresearchgroup.org/software.
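Longitudinal QC monitoring of this kind is often implemented by comparing each run's metric with a robust centre of the cohort. The sketch below flags runs whose value deviates from the median by more than `k` scaled median absolute deviations; the metric (for example, protein identifications per run), the threshold and the function name are hypothetical and do not describe LAMPrEY's internals or its API.

```python
from statistics import median
from typing import List, Sequence


def flag_outlier_runs(values: Sequence[float], k: float = 3.0) -> List[int]:
    """Return indices of runs whose QC metric is a robust outlier.

    Uses the median absolute deviation (MAD), scaled by 1.4826 so that it
    estimates the standard deviation for normally distributed data.
    """
    centre = median(values)
    mad = 1.4826 * median([abs(v - centre) for v in values])
    if mad == 0:
        # Degenerate case: most runs identical; flag anything that differs.
        return [i for i, v in enumerate(values) if v != centre]
    return [i for i, v in enumerate(values) if abs(v - centre) / mad > k]


# Hypothetical per-run protein identification counts in acquisition order.
print(flag_outlier_runs([5000, 5100, 4950, 5050, 1200]))
```

Median and MAD are used instead of mean and standard deviation so that a single failed run does not inflate the spread and mask itself.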
Cain, S. A.; Fatima, M.; Humphries, M.
Manchester Proteome Profiler (MPP) is an open-source R Shiny application that streamlines downstream analysis of quantitative proteomic data. Compatible with protein-group intensity tables from MaxQuant, FragPipe, Proteome Discoverer and other custom layouts, MPP provides an integrated platform for filtering, normalisation, imputation, differential expression analysis and cluster analysis across user-chosen experimental conditions. MPP supports both single- and dual-dataset comparisons, incorporates SAINTexpress for affinity purification and proximity labelling experiments, and links clusters of significant proteins to downstream functional enrichment and interaction-network analyses via Gene Ontology, BioGRID and STRING. Benchmarking with a KRAS proximity biotinylation dataset demonstrated the ability of MPP to identify reproducible clusters of differentially expressed proteins and reveal biologically meaningful patterns, including enrichment of solute carrier transporters and adhesion molecules. With interactive visualisations, customisable reports, and support for complex experimental designs, MPP offers a novel, versatile and user-friendly environment for proteomic data exploration and hypothesis generation.
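Filtering, normalisation and imputation steps of the kind MPP integrates can be sketched in a few lines. The specific choices below (a minimum-valid-values filter, median centring of log2 intensities, and imputation with a fixed offset below the observed minimum) are common defaults assumed for illustration; the abstract does not state which methods MPP uses, MPP itself is an R Shiny application rather than Python code, and the function name and `shift` value are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Optional

Row = List[Optional[float]]  # raw intensity per sample; None = missing


def preprocess(table: Dict[str, Row], min_valid: int = 2,
               shift: float = 1.0) -> Dict[str, List[float]]:
    """Filter, log2-transform, median-centre and impute a protein table."""
    # 1. Filter: keep proteins quantified in at least `min_valid` samples.
    kept = {p: r for p, r in table.items()
            if sum(v is not None for v in r) >= min_valid}
    if not kept:
        return {}
    # 2. Log2-transform observed intensities.
    logged = {p: [math.log2(v) if v is not None else None for v in r]
              for p, r in kept.items()}
    n = len(next(iter(logged.values())))
    # 3. Median-centre each sample (column) so sample medians align at 0.
    col_medians = [median([r[j] for r in logged.values() if r[j] is not None])
                   for j in range(n)]
    centred = {p: [v - col_medians[j] if v is not None else None
                   for j, v in enumerate(r)]
               for p, r in logged.items()}
    # 4. Impute missing values at a fixed offset below the smallest observed value.
    floor = min(v for r in centred.values() for v in r if v is not None) - shift
    return {p: [v if v is not None else floor for v in r]
            for p, r in centred.items()}
```

Imputing below the observed minimum encodes the common assumption that missing values are mostly low-abundance proteins; if values are missing at random, a different imputation strategy would be more appropriate.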
Anwar, A. M.; Bayoumi, S.; Lahti, L.; Coffey, E.
Large-scale mass spectrometry (MS)-based proteomics, including single-cell proteomics, is routinely affected by technical variation arising from discrete batch effects, inter-laboratory differences and continuous signal drift during data acquisition. Current correction strategies typically address these sources of unwanted variation independently and often require either removal of proteins with missing values or imputation before correction, both of which may lead to information loss and potential amplification of technical bias. Here we present NMFBatch, a unified statistical framework that simultaneously models discrete and continuous unwanted variation in bulk and single-cell proteomics data. NMFBatch integrates non-negative matrix factorization with generalized additive modelling and directly accommodates missing values, thereby enabling both on-the-fly imputation during correction and optional post-correction imputation. Benchmarking against six batch-correction methods using multi-laboratory reference datasets and a large plasma proteomics cohort showed that NMFBatch consistently reduces batch-associated variation while preserving biological structure under both balanced and confounded experimental designs. Application to single-cell proteomics data further showed effective reduction of TMT- and acquisition-associated variation while retaining biologically meaningful clustering. Together, these results establish NMFBatch as a flexible framework for modelling unwanted variation in proteomics experiments, with potential applications in cross-cohort harmonization and integrative proteomics analysis. Graphical abstract created in BioRender. Youssef, A. (2026) https://BioRender.com/c1q1yxt
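NMFBatch's exact model is not given in the abstract, but the underlying idea of factorising a non-negative matrix while skipping missing values can be sketched with masked multiplicative updates. The function below is a hypothetical illustration of that single ingredient: it omits the generalized additive modelling of continuous drift and any batch terms, so it is not NMFBatch, and its name, defaults and the NumPy implementation are assumptions.

```python
import numpy as np


def masked_nmf(X: np.ndarray, rank: int, n_iter: int = 500,
               seed: int = 0, eps: float = 1e-10):
    """Non-negative matrix factorisation that ignores missing (NaN) entries.

    Minimises ||M * (X - W @ H)||_F^2 over observed entries only (M = mask)
    with weighted Lee-Seung-style multiplicative updates, then fills missing
    entries from the low-rank reconstruction W @ H.
    """
    mask = ~np.isnan(X)
    Xz = np.where(mask, X, 0.0)  # zero-fill; masked entries do not enter the loss
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    losses = []
    for _ in range(n_iter):
        WH = W @ H
        H *= (W.T @ Xz) / (W.T @ (mask * WH) + eps)
        WH = W @ H
        W *= (Xz @ H.T) / ((mask * WH) @ H.T + eps)
        losses.append(float(np.sum((mask * (Xz - W @ H)) ** 2)))
    filled = np.where(mask, X, W @ H)  # observed values kept, gaps imputed
    return W, H, filled, losses


rng = np.random.default_rng(1)
X = rng.random((8, 2)) @ rng.random((2, 6))  # toy non-negative rank-2 data
X[0, 1] = np.nan
X[3, 4] = np.nan
W, H, filled, losses = masked_nmf(X, rank=2)
```

Because missing entries are excluded from the loss rather than imputed beforehand, the reconstruction supplies values for them as a by-product, which is loosely analogous to the on-the-fly imputation the authors describe.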
Charkow, J.; Ghaznavi, M.; Seale, B.; Peng, J.; Gingras, A.-C.; Rost, H.
In low-input mass spectrometry-based proteomics, data-independent acquisition (DIA), including diaPASEF, is quickly becoming the method of choice for label-free quantification. Whether empirical or in silico spectral libraries are used, performance depends on the library; however, the optimal library construction strategy for low-input proteomics remains an open question. To address this, we examine and develop library construction approaches that are compatible with both spectrum-centric and peptide-centric analysis workflows. These approaches leverage a closely related, high-quality sample to improve library quality. First, we validated our approach with bulk sample amounts, where we observed that the effect of gas-phase fractionation-based library construction depends on the software framework, with improvements more pronounced in OpenSWATH than in DIA-NN. In OpenSWATH, our peptide-centric library reconstruction workflow consistently outperforms a transfer learning strategy, an emerging alternative approach. In DIA-NN, trends depend on the library source, highlighting OpenSWATH's stronger dependence on the search space. In low-input applications, such as single-cell-equivalent injection amounts (100 pg) of HeLa cell digest on a timsTOF SCP, our library construction approach provided more pronounced improvements across both software tools than in bulk samples. Using a peptide-centric reconstruction approach with the OpenSWATH analysis framework, we detected over 15,000 peptide precursors (2,480 protein groups), a 90% improvement over the original library. Furthermore, using a spectrum-centric construction approach, peptide precursor identification rates improved more than 6-fold (~1,000 to ~6,000). Our strategy provides a practical solution for generating high-quality libraries in low-input applications.
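The two headline gains are reported in different units, a percentage improvement and a fold change, which are easy to conflate. A short helper makes the relation explicit: a 90% improvement corresponds to a 1.9-fold change, whereas the approximate change from 1,000 to 6,000 precursors is 6-fold, i.e. a 500% improvement. The function names are illustrative.

```python
def fold_change(before: float, after: float) -> float:
    """Ratio after/before; 2.0 means twice as many identifications."""
    return after / before


def percent_improvement(before: float, after: float) -> float:
    """Relative gain in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


print(fold_change(1000, 6000), percent_improvement(1000, 6000))
```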
Davison, C.; Locker, N.; Marques, M.; Kelly, S.; Relton, E.; Sharma, T.; Fraser, E.; Aragon Fernandez, P.; Schoof, E. M.; Petersen, M.; Pascoe, J.; Lilley, K. S.; Pinto, S. M.; Spick, M.; Bailey, M.
Many diseases arise from dysfunction within specific organelles or biomolecular condensates, highlighting the value of analysing proteins at subcellular resolution to uncover new biological mechanisms. We report a novel capillary-based subcellular sampling workflow coupled with liquid chromatography-mass spectrometry (LC-MS) for proteomic analysis of defined subcellular regions of individual cells. We applied this methodology to stress granules (SGs), membrane-less biomolecular condensates that form in response to cellular stress (including viral infection) and are implicated in infection, neuropathology and cancer. Comprehensive characterisation of SG protein composition remains limited by technical challenges associated with bulk purification, including loss of spatial context, dynamic behaviour and contamination from cytosolic material. Using our novel method, we identified a high-confidence set of 405 SG-associated proteins, including 46 established SG residents alongside numerous previously unreported candidates. Functional enrichment analysis revealed pathways consistent with known SG biology, while comparison with an independent cytosolic proteome dataset demonstrated minimal overlap, supporting the specificity of the sampling strategy. Selected novel SG protein candidates (AHNAK2, DDX39B, NUDT1 and FKBP2) were validated using immunofluorescence microscopy. These findings establish capillary-based subcellular sampling as a viable approach for proteomic analysis of SGs with preserved spatial context and provide a framework for analysing other subcellular compartments. Table of contents: We report an LC-MS-based capillary sampling workflow for proteomic analysis of subcellular structures within single cells. This methodology identified 405 high-confidence stress granule-associated proteins, including 46 previously established and numerous novel candidates. 
The approach demonstrated high specificity and preserved spatial context, expanding the capabilities of subcellular proteomics. Figure made in BioRender.com.
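The specificity argument rests on how little the SG protein list overlaps an independent cytosolic proteome. A minimal way to quantify such overlap is shown below; the function name and the placeholder protein identifiers are hypothetical toy data, not the study's lists, and the abstract does not say which overlap statistic the authors used.

```python
from typing import Dict, Set


def overlap_summary(candidate: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Summarise how much a candidate proteome overlaps a reference proteome."""
    shared = candidate & reference
    union = candidate | reference
    return {
        "shared": float(len(shared)),
        # Share of candidate proteins that also appear in the reference.
        "fraction_of_candidate": len(shared) / len(candidate) if candidate else 0.0,
        # Jaccard index: shared / union, 0 = disjoint, 1 = identical sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy placeholder sets standing in for an SG list and a cytosolic list.
print(overlap_summary({"P1", "P2", "P3", "P4"}, {"P4", "P5", "P6"}))
```

A low `fraction_of_candidate` indicates that most sampled proteins are not simply background cytosolic proteins; judging whether an overlap is unexpectedly small would additionally require a null model, such as a hypergeometric test against the detectable proteome.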
O'Loughlin, J.; Moses, T.
Metabolomics offers a sophisticated analytical framework for characterising the molecular phenotype of organisms and complex living systems at high resolution. As the functional endpoint of the omics cascade, the metabolome closely reflects cellular activity, integrating genetic, transcriptomic and proteomic variation with external environmental influences. However, the inherent complexity of metabolomic datasets, characterised by high-dimensional chemical diversity, wide dynamic ranges, and significant matrix effects, necessitates a rigorous suite of chemometric and bioinformatic workflows. For researchers without a background in computational biology, the multi-stage requirement for raw data pre-processing, signal deconvolution, and multivariate statistical modelling (such as PCA or PLS-DA) presents a substantial barrier to entry. Navigating these convoluted data architectures remains a primary challenge in deriving biological meaning from the global metabolic profile. Here, we present a workflow that uses Python Dash apps to create a user-friendly interface that simplifies data processing and statistical calculations. Users can select their desired samples to run various statistical tests, generating interactive, publication-quality figures to explore their results. The apps were deployed on an Apache server via cPanel, allowing individuals to share their findings with collaborators and research facilities to share metabolomics results with their users.
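The multivariate step that such apps wrap can be as small as the function below, whose output a Dash callback could pass to a plotting component. This is a hypothetical sketch: the function name, the unit-variance (auto) scaling choice and the use of NumPy's SVD are assumptions rather than the authors' code, and the Dash interface layer is omitted.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 2):
    """PCA of a samples x features matrix via SVD after autoscaling.

    Each feature is mean-centred and divided by its standard deviation
    (unit-variance scaling, a common choice in metabolomics). Returns the
    sample scores and the fraction of variance explained per component.
    """
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant features unscaled to avoid division by zero
    Z = Xc / sd
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equals Z @ Vt[:k].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.random((10, 5))  # toy data: 10 samples x 5 metabolite features
scores, explained = pca_scores(X)
```

Autoscaling gives every metabolite equal weight regardless of its intensity range, which is why it is a frequent default when features span several orders of magnitude.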
Kelly, M. I.; Thang, W. C. M.; Pang, C. N. I.; Gustafsson, O. J. R.; Ashwood, C.
Glycans are integral biomolecules whose presence cannot be predicted from genomic data alone, necessitating experimental characterisation through approaches including mass spectrometry. Assigning glycan compositions to observed mass-to-charge ratios is computationally challenging because of the potential monosaccharide diversity, and existing tools lack the flexibility required for integration into automated bioinformatic workflows. Here, we present GlyComboCLI, an open-source command-line application for assigning glycan compositions to mass spectrometry data, which expands upon our previous GUI application, GlyCombo. GlyComboCLI accepts mass lists and vendor-neutral mzML files and supports an extensive range of monosaccharides, derivatisation states, reducing-end modifications, and adducts to ensure compatibility with a breadth of glycomics approaches. Outputs are compatible with downstream tools including Skyline and GlycoWorkBench. The software is deployable as a standalone executable, a Docker container, and a Galaxy tool, adhering to FAIR principles. When applied to 52 raw files from a published mouse glycomics dataset, a local instance completed composition assignment and downstream quality control in under three hours, recovering biologically consistent findings. Furthermore, an integrated Galaxy workflow demonstrated reproducible detection of sialidase treatment effects. GlyComboCLI substantially reduces the pool of spectra requiring manual structural interpretation, offering a flexible and scalable solution for glycomics bioinformatic workflows.
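At its core, composition assignment asks which combinations of monosaccharide residues sum to an observed mass. The brute-force sketch below illustrates that search for underivatised glycans with a free reducing end, observed as protonated ions. It is not GlyComboCLI's algorithm or interface, and the search limits, tolerance and function name are hypothetical. Supporting derivatisation, reducing-end modifications or other adducts, as GlyComboCLI does, would mean changing the residue masses and the terminal and adduct terms.

```python
from itertools import product
from typing import Dict, List

# Monoisotopic residue masses (Da) of common underivatised monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}
WATER = 18.01056    # added once for a free reducing end
PROTON = 1.007276
MAX_COUNT = {"Hex": 12, "HexNAc": 8, "dHex": 4, "NeuAc": 4}  # hypothetical limits


def assign_compositions(mz: float, charge: int = 1,
                        ppm_tol: float = 10.0) -> List[Dict[str, int]]:
    """Enumerate compositions whose [M+zH]z+ m/z matches within ppm_tol."""
    neutral = mz * charge - charge * PROTON
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(*(range(MAX_COUNT[n] + 1) for n in names)):
        if sum(counts) == 0:
            continue
        mass = WATER + sum(c * RESIDUE_MASS[n] for c, n in zip(counts, names))
        if abs(mass - neutral) / neutral * 1e6 <= ppm_tol:
            hits.append(dict(zip(names, counts)))
    return hits


# Singly protonated ion expected for Hex5HexNAc2 (neutral mass about 1234.4334 Da).
print(assign_compositions(1235.4407))
```

Exhaustive enumeration is fast enough for a handful of residue types, but the number of combinations grows multiplicatively with each added monosaccharide, which is one reason dedicated tools apply constraints or smarter search strategies.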
Wongtrakul-Kish, K.; Herbert, B. R.; Haynes, P. A.; Packer, N. H.
Adipogenesis is the process by which adipose-derived stem cells (ADSCs) respond to extracellular signals from the stem cell niche and differentiate into adipocytes (fat cells); it can be studied in vitro using a cocktail of chemicals that promote adipogenic differentiation to produce differentiated ADSCs (dADSCs). The global membrane N- and O-glycosylation changes during this process were previously analysed and compared with native adipocytes as a benchmark for a true adipocyte profile, revealing that bisecting GlcNAc-type N-glycans are characteristic of adipogenesis. Because stem cell differentiation has been widely reported to cause cellular protein changes, the membrane proteomes of the same cells (ADSCs, dADSCs and mature adipocytes) were characterised here using label-free quantitative shotgun proteomics. The membrane proteome showed greater differences in protein numbers between cell types than the previously reported N-glycome, which had been highly similar between stem cells and in vitro dADSCs, suggesting that the proteome is more dynamic during in vitro adipogenesis. Following the global shotgun analysis, a more targeted approach, proteomic analysis of de-N-glycosylated peptides from gel-separated proteins, revealed glycoproteins not detected in the shotgun analysis. This approach showed that the adipogenic marker CD36, although under-represented in the shotgun analysis, was the dominant (glyco)protein in the adipocyte membrane proteome; CD36 was also up-regulated at the mRNA transcript level in both in vitro differentiated ADSCs (7.1-fold increase) and mature adipocytes (102.9-fold increase). 
A comparison of CD36 sequence coverage between the global shotgun analysis and the de-N-glycosylated CD36 revealed a 41% increase when N-glycans were removed before trypsin digestion, explaining its higher observed abundance and highlighting the need for de-N-glycosylation in proteomics experiments to improve glycoprotein identification. By integrating previously reported glycomics data with the proteomics and transcriptomics analyses in this work, this systems glycobiology approach extends the investigation of membrane protein glycosylation changes during adipose-derived stem cell differentiation. The work provides a framework for future glycoproteomics-based investigations into the differentiation of stem cells into adipocytes and should help uncover related pathologies and potential therapeutic applications.
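Sequence coverage, the metric behind the CD36 comparison, is the share of a protein's residues covered by at least one identified peptide. A minimal sketch follows; the function name and toy sequence are hypothetical. Because the abstract does not say whether the 41% increase is relative or in percentage points, the code computes coverage only. Note also that after PNGase F-type de-N-glycosylation, formerly glycosylated asparagines are typically observed as aspartate, so real peptide matching would need to allow for that substitution.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Percentage of residues in `protein` covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 10-residue protein and identified peptides.
print(sequence_coverage("MKTAYIAKQR", ["TAYIAK", "QR"]))
```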
Moagi, M.; Beke, L.; Mehes, G.; Kecskemeti, G.; Szabo, Z.; Turiak, L.; Csosz, E.
Fresh-frozen tissues are considered the gold standard for proteomic analyses due to superior preservation of protein integrity; however, their use is limited by the logistical and financial requirements of long-term storage. Formaldehyde-fixed paraffin-embedded (FFPE) tissues provide a practical alternative owing to their stability and widespread availability in clinical settings. A critical step in FFPE proteomics is deparaffinization, which traditionally relies on organic solvents such as xylene, along with efficient reversal of formaldehyde-induced crosslinks. In this study, we evaluated multiple FFPE protein extraction and digestion workflows including chaotropic, surfactant-based, and detergent-free approaches in combination with xylene-free deparaffinization strategies, using label-free data-independent acquisition (DIA) LC-MS/MS. Among the tested methods, a chaotropic-, reductant-, and surfactant-free in-solution digestion workflow demonstrated robust protein and peptide recovery. A modified version of this protocol further improved peptide coverage while maintaining comparable protein depth. The applicability of the optimized workflow was assessed using FFPE needle biopsy samples from control, hepatic steatosis, and liver fibrosis groups. Distinct proteomic patterns were observed across conditions, with hepatic steatosis associated with early activation of stress-response pathways, while fibrosis showed evidence suggesting altered lipid metabolism. Overall, this study presents a simple, xylene-free, and MS-compatible workflow for FFPE proteomics that is suitable for low-input clinical samples and may support broader application of archival tissues in proteomic research.
Totsune, E.; Nakajima, D.; Konno, R.; Mikami-Saito, Y.; Arai-Ichinoi, N.; Nishida, H.; Yagi, H.; Ishige, T.; Suzuki, H.; Shirota, M.; Takayama, J.; Takano-Asai, C.; Shimura, M.; Sasai, H.; Lee, T.; Kido, J.; Nakajima, Y.; Kobayashi, H.; Kikuchi, A.; Numakura, C.; Hamazaki, T.; Oishi, K.; Nakamura, K.; Kawashima, Y.; Ohara, O.; Wada, Y.
Background: Citrin deficiency, caused by biallelic pathogenic variants in SLC25A13, must be identified early to prevent serious complications such as hyperammonemia and liver failure. However, clinical diagnosis is often delayed because of its nonspecific presentation and the limited sensitivity of amino acid-based newborn screening methods. Although genome-based evaluations are being investigated to address these issues, concerns about their cost, turnaround time, variant interpretation, and data handling highlight the need for a more practical yet reliable alternative. We investigated the feasibility of applying a proteomic approach to dried blood spots (DBS), which are routinely used in newborn screening. Methods: We performed untargeted liquid chromatography-tandem mass spectrometry to analyze the proteome of DBS using a previously developed "non-targeted analysis of non-specifically DBS-absorbed proteins" (NANDA) workflow. SLC25A13 protein abundance was quantified in individuals with biallelic loss-of-function mutations, compound loss-of-function/missense mutations, and heterozygous carriers; this was also evaluated in healthy and diseased controls representing relevant differential diagnoses. To leverage proteomic information, we derived a multivariate proteomic signature using feature selection and evaluated its performance with leave-one-out cross-validation. Biological relevance was assessed by enrichment analysis, and complementary transcriptomics was performed using RNA sequencing. Results: A total of 7,474 proteins, including SLC25A13, were consistently detected in DBS. SLC25A13 was undetectable in individuals with biallelic loss-of-function mutations. However, individuals with compound loss-of-function/missense genotypes showed reduced but measurable SLC25A13 levels, comparable to those observed in heterozygous carriers. 
In contrast, a compact 15-protein signature accurately identified individuals with compound loss-of-function/missense genotypes (AUC, 0.99; sensitivity, 1.00; specificity, 0.95). The signature was enriched for Ca2+-response, and transcriptomics showed downregulation of genes related to multimodal ion channels in affected individuals compared to controls. Conclusions: DBS-based proteomic profiling may assist in the diagnosis of citrin deficiency through SLC25A13-quantification and a biologically plausible multivariate signature. More broadly, this strategy offers a promising new diagnostic layer for protein disorders, providing a proteomic readout in a clinically practical DBS format with potential utility for future diagnostic and screening applications.
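Leave-one-out cross-validation holds out each individual in turn, fits the classifier on the remaining samples, and scores the held-out sample; AUC, sensitivity and specificity are then computed from those out-of-sample scores. The sketch below uses a nearest-centroid classifier as a stand-in because the abstract does not name the model or the feature-selection method; all names and the toy data are hypothetical. In a real analysis, any feature selection must also be repeated inside each fold, otherwise the performance estimates are optimistically biased.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _centroid(rows: List[Vector]) -> List[float]:
    return [mean(col) for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_scores(X: List[Vector], y: List[int]) -> List[float]:
    """Out-of-sample scores; higher means closer to the affected (y=1) centroid."""
    scores = []
    for i in range(len(X)):
        # Fit (compute centroids) on every sample except the held-out one.
        pos = [X[j] for j in range(len(X)) if j != i and y[j] == 1]
        neg = [X[j] for j in range(len(X)) if j != i and y[j] == 0]
        scores.append(_sq_dist(X[i], _centroid(neg)) - _sq_dist(X[i], _centroid(pos)))
    return scores


def auc(scores: List[float], y: List[int]) -> float:
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, lab in zip(scores, y) if lab == 1]
    neg = [s for s, lab in zip(scores, y) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: List[float], y: List[int],
                            threshold: float = 0.0) -> Tuple[float, float]:
    tp = sum(1 for s, lab in zip(scores, y) if lab == 1 and s > threshold)
    tn = sum(1 for s, lab in zip(scores, y) if lab == 0 and s <= threshold)
    return tp / y.count(1), tn / y.count(0)


# Toy two-protein profiles: three controls (0) and three affected (1).
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
y = [0, 0, 0, 1, 1, 1]
s = loocv_scores(X, y)
print(auc(s, y), sensitivity_specificity(s, y))
```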
Berthias, F.; Bilgin, N.; Smyrnakis, A.; Le Boiteux, E.; Kosmopoulou, M.; Albers, C.; Suckau, D.; Mecinovic, J.; Papanastasiou, D.; Jensen, O. N.
Deep characterization of intact proteoforms remains an analytical challenge in functional proteomics, particularly for heterogeneous multi-site post-translational modifications at distinct amino acid residues. Histones are among the most dynamically and diversely post-translationally modified proteins in eukaryotic cells, carrying multiple, co-occurring and reversible modifications that can give rise to isomeric proteoform species. Tandem mass spectrometry with multimodal fragmentation capabilities is a promising approach for deep characterization of intact proteoforms, such as modified histones. We applied the novel timsOmni mass spectrometer, which incorporates the Omnitrap platform enabling multimodal MS workflows, for residue-level mapping of histone modifications, including acetylation and methylation. Recombinant histones H3.1 and H4 were acetylated in vitro by the enzymes GCN5, PCAF and p300 to generate mono- and multi-acetylated proteoforms. Complementary MS2 electron- and collision-based dissociation methods (ECD, EID, RCID and ECciD), together with MS3 strategies, produced complete or near-complete backbone fragmentation of intact protein ions (>92% amino acid sequence coverage). For monoacetylated species generated by the more site-selective lysine acetyltransferases, the dominant proteoform matched the known catalytic preferences of the enzymes (H3.1K14ac for GCN5 and PCAF, and H4K8ac for PCAF), while minor positional isomers were also identified and their relative abundance estimated. In contrast, the broader substrate specificity of p300 produced a wide distribution of H4 proteoforms bearing up to seven acetylated lysine residues. Species carrying six and seven acetylations were characterized by multimodal MS2/MS3 experiments, enabling localization of individual acetylation sites and discrimination of positional isomers. 
Finally, endogenous histone proteoforms from liver extracts were analyzed, yielding sequence coverages of 92-93% for the most abundant species and enabling confident localization of multiple PTMs (acetylation and methylation). These results illustrate that multimodal MSn fragmentation of intact proteins supports residue-level assignment of combinatorial histone marks and coexisting positional isomers.
Highlights:
- Multimodal MS2/MS3 maps histone PTMs on intact proteins.
- ECD, EID, RCID, and ECciD provide complete or near-complete sequence coverage.
- MS3 localizes acetylation sites and distinguishes positional isomers.
- Endogenous H4 proteoforms are assigned with site-specific PTM mapping.
D'Oliviera, A.; Olson, S.; Bernhard, H.; Yu, Y.; Mugridge, J. S.
Transfer RNA methyltransferase 1 (TRMT1) installs N2-methylguanosine and N2,N2-dimethylguanosine modifications at position 26 of mammalian tRNAs, supporting tRNA structure, translation, and cellular response to redox stress. However, the local environment and interactome of TRMT1 in the cell are poorly defined. Here, we use APEX2-based proximity labeling at the N- and C-terminus of TRMT1, coupled with label-free quantitative proteomics, to map candidate TRMT1-proximal proteins in HEK293T cells. Mass spectrometry data were acquired using both data-independent acquisition (DIA) and data-dependent acquisition (DDA), and DIA substantially increased proximity proteome coverage, reproducibility, and the number of significantly enriched candidate hits compared with DDA. N- and C-terminal APEX2-TRMT1 constructs captured largely overlapping proteomes, suggesting that the dual-labeling strategy provides a robust map of proximal proteins. Analysis of the significant TRMT1-proximal proteins reveals enrichment in RNA processing and ribonucleoprotein-associated factors, in addition to hits connected to tRNA modification, tRNA biogenesis, and redox-associated biology. These data provide a proteome-scale view of TRMT1-associated cellular proteins and environments and lay the groundwork for future validation of functional TRMT1 interaction networks.
Significance:
- Fusing APEX2 to both the N- and C-terminus of the bait enhanced sensitivity for identifying protein interactions.
- Combining APEX2-based endogenous labeling with DIA mass spectrometry increases the reproducibility and depth of the proximity proteome.
- The study provides a rich resource of potential TRMT1 interactors and proximal proteins, which warrants further validation studies.
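Replicate reproducibility in comparisons such as DIA versus DDA is often summarised as the coefficient of variation (CV) of each protein's intensity across replicates, then compared via the median over proteins. The sketch below shows that calculation; the abstract does not state which reproducibility metric was used, so the metric choice, the toy values and the function names are assumptions.

```python
from statistics import mean, median, stdev
from typing import Dict, List, Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation (%) of replicate intensities for one protein."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(table: Dict[str, List[float]]) -> float:
    """Median CV across proteins quantified in every replicate."""
    return median([cv_percent(v) for v in table.values()])


# Toy replicate intensities (three replicates per protein).
print(median_cv({"A": [9, 10, 11], "B": [10, 10, 10], "C": [18, 20, 22]}))
```

A lower median CV for one acquisition mode indicates more reproducible quantification, although the comparison is only fair when restricted to proteins quantified by both methods.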
Jaber, N.; Di Somma, A.; Rodriguez-alfonso, A. A.; Cane, C.; Read, C.; Ständker, L.; Wiese, S.; Duilio, A.; Münch, J.; Spellerberg, B.
Background: Rising antimicrobial resistance rates require new therapeutic approaches, such as antimicrobial peptides (AMPs), which are part of the innate immune defense, as alternatives to antibiotics. In this study, we aimed to unravel the antibacterial activity of a human histone H1.2 peptide against Pseudomonas aeruginosa and its potential immunomodulatory role. Methods: We used a hemofiltrate peptide database for antimicrobial peptide prediction to identify novel human AMPs. Thirteen histone H1 sequences were identified as putative AMPs, synthesized, and tested against bacterial ESKAPE pathogens in a radial diffusion assay. SYTOX Green assays, electrophoretic mobility shift assays, and differential proteomics were used to determine the mode of action of the H1.2 peptide fragment. A crystal violet assay was performed to evaluate inhibition of biofilm formation. The cytotoxicity of the peptide was tested in LDH and Alamar assays. Finally, scanning electron microscopy was performed to visualize the contribution of H1.2 to NET formation. Results: The H1.2 peptide inhibited the growth of P. aeruginosa in a dose- and pH-dependent manner without cytotoxicity towards mammalian THP-1 cells. It acts on intracellular targets to inhibit the growth of P. aeruginosa. STRING analysis of the differential proteomics data showed that H1.2 downregulates proteins involved in the biogenesis of outer membrane proteins, including their folding and trafficking across the cytoplasmic membrane. Scanning electron microscopy images showed that H1.2 forms NET-like structures capable of trapping and immobilizing P. aeruginosa. Conclusion: The characterized antimicrobial activity of H1.2 points to a role for human histone H1 fragments in innate immunity and may represent a promising approach for the development of novel antibacterial therapies. 
Graphical Summary: Sec transport and the BAM complex system, including chaperone proteins and quality-control proteases, are inhibited by H1.2 in Pseudomonas aeruginosa. Outer membrane proteins (OMPs) are synthesized in the cytoplasm and transported across the inner membrane via the Sec translocase, assisted by SecA/SecB or ribosomes. In the periplasm, they are escorted by chaperones such as SurA to the BAM complex for insertion into the outer membrane. Here, we show that the antimicrobial peptide H1.2 targets membrane biogenesis in P. aeruginosa by downregulating the Sec translocase (SecA/SecB and SecYEG), SurA, and the BAM complex, leading to improper transfer, folding, and insertion of OMPs into the outer membrane. Normally, misfolded proteins are degraded by the protease MucD to prevent toxic aggregation in the bacteria. Because H1.2 also inhibits MucD, proteotoxic stress is exacerbated, ultimately compromising bacterial homeostasis and viability. Figure created using BioRender.com.
Mun, H.; Leamy, M.; Kaushik, A.; Kieslich, C.; Douglas-Green, S. A.
When nanoparticles are exposed to biological fluids, they spontaneously adsorb proteins, forming a protein corona that defines their biological identity and dictates cellular uptake, biodistribution, and toxicity. Protein corona characterization typically relies on proteomics approaches (e.g., LC-MS/MS) that identify adsorbed proteins and generate long protein lists, often visualized as complex heatmaps. Although heatmaps display the data, they offer no heuristic guidance, leaving the mechanisms that drive adsorption unknown. Moreover, interpretation of protein corona proteomics data remains limited by fragmented workflows, inconsistent preprocessing, and visual outputs that are often descriptive rather than readily interpretable. These conventional methods identify adsorbed proteins but fail to explain why specific proteins are selected or how they influence the particles' biological fate. Here, we developed ProCAST (Protein Corona Analysis and Statistical Tool), an R-based framework for protein corona proteomics that integrates proteomics data, nanoparticle metadata, protein annotations, and multi-level visualization within a single analytical workflow. ProCAST clusters abundant proteins based on sample conditions, sequence descriptors, and property or protein correlations, and provides gene ontology-based functional visualization. It also distinguishes abundant proteins from frequent proteins, providing distinct layers of information from the same dataset. We used ProCAST to re-analyze previously published PAMAM G4 dendrimer-FBS datasets, demonstrating that it reproduces descriptor-level visualizations and offers new insights through clearer comparisons of functional patterns and hypothesis generation from dominant corona proteins. By organizing results as complementary views of the same dataset, ProCAST helps shift protein corona analysis from descriptive outputs toward structured, comparative, and experimentally testable interpretations.
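The distinction between abundant and frequent proteins can be illustrated with a short sketch. ProCAST itself is written in R; the Python function below is a hypothetical illustration, not ProCAST's code or API. It assumes each sample is a mapping from protein ID to a non-negative intensity, treats "abundance" as mean relative abundance across samples (an undetected protein counts as zero), and treats "frequency" as the fraction of samples in which a protein is detected. The function name and toy data are invented for illustration.

```python
from statistics import mean


def abundance_and_frequency(samples: dict[str, dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Return {protein: (mean relative abundance, detection frequency)}.

    Hypothetical helper: `samples` maps sample name -> {protein ID: intensity}.
    """
    proteins = sorted({p for sample in samples.values() for p in sample})
    n_samples = len(samples)
    result = {}
    for protein in proteins:
        rel = []
        for sample in samples.values():
            total = sum(sample.values())
            # Relative abundance within this sample; an undetected protein contributes 0.
            rel.append(sample.get(protein, 0.0) / total if total else 0.0)
        # Frequency: share of samples in which the protein was detected at all.
        detected = sum(1 for sample in samples.values() if sample.get(protein, 0.0) > 0)
        result[protein] = (mean(rel), detected / n_samples)
    return result


# Toy corona data (invented protein names and intensities, for illustration only).
samples = {
    "np_A": {"P1": 10.0, "P2": 90.0},
    "np_B": {"P1": 10.0, "P3": 90.0},
    "np_C": {"P1": 10.0, "P3": 90.0},
}
res = abundance_and_frequency(samples)
most_abundant = max(res, key=lambda p: res[p][0])
most_frequent = max(res, key=lambda p: res[p][1])
```

In this toy example P3 ranks as the most abundant protein while P1, present at low levels in every sample, ranks as the most frequent; that divergence is the reason for reporting the two layers separately.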
Myers, S. A.; Vasquez Castro, F.; Sanchez Solis, L. D.
Motivation: Post-translational modifications (PTMs) are critical to protein function, yet the function of most known modification sites remains uncharacterized. CRISPR-mediated phenotypic screens using base editors offer a powerful approach to dissecting PTM function at scale. However, existing sgRNA design tools for base editing applications are DNA-centric and lack the throughput required to integrate seamlessly with the outputs of mass spectrometry-based proteomics experiments. Results: We introduce PrEditR (protein editing in R), an open-source, protein-centric tool for high-throughput sgRNA design for custom base editor screens. PrEditR enables users to designate specific amino acid residues in proteins and design protospacer sequences that target the endogenous gene to install missense mutations via base editors. Availability and Implementation: PrEditR is available on GitHub and Docker Hub.
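The protein-centric idea of mapping a chosen residue to its codon and then searching for protospacers that place that codon inside a base editor's activity window can be sketched as below. This is not PrEditR's code. It assumes a SpCas9 NGG PAM, a 20-nt protospacer, a cytosine base editor acting on C within protospacer positions 4 to 8 (counted from the PAM-distal end), and a coding sequence supplied directly rather than looked up from a genome. It searches only the coding strand and does not check whether the resulting edit is missense; the function name is hypothetical.

```python
import re

# Assumed editing window: protospacer positions 4-8 (1-based, PAM-distal end = 1).
WINDOW_START, WINDOW_END = 4, 8
# Zero-width lookahead so overlapping sites are all reported: 20 nt + N + GG (SpCas9 PAM).
SITE_RE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def protospacers_for_residue(cds: str, residue: int, edit_base: str = "C"):
    """Return (start, protospacer, editable positions) for sites hitting `residue`'s codon.

    Hypothetical helper. `cds` starts at the start codon; `residue` is 1-based.
    Returned positions are 0-based indices into `cds`; only the coding strand is searched.
    """
    cds = cds.upper()
    codon_start = 3 * (residue - 1)
    codon = range(codon_start, codon_start + 3)
    hits = []
    for match in SITE_RE.finditer(cds):
        start = match.start()
        # 0-based indices covered by protospacer positions 4-8.
        window = range(start + WINDOW_START - 1, start + WINDOW_END)
        # Codon bases that fall in the window and carry the base the editor acts on.
        editable = [pos for pos in codon if pos in window and cds[pos] == edit_base]
        if editable:
            hits.append((start, match.group(1), editable))
    return hits
```

A production tool would also scan the reverse strand, support other PAMs and editor windows, and predict the amino acid change produced by each edit.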
Karaman, I.; Payne, T.; Vizcaino, J. A.
Public data reuse is a key driver of progress in the omics sciences, increasingly including metabolomics. In this study, we present a validated analysis of confirmed reuse of datasets from the MetaboLights data repository, one of the leading resources in the field. Candidate publications were collected via dataset identifiers (MTBLS#) using a Python-based retrieval pipeline across major publisher databases and then manually validated to distinguish active reuse from citation-only mentions. Overall, 272 unique publications were confirmed to have reused at least one MetaboLights dataset. Reuse is dominated by Method/Tool Development, with smaller contributions from Secondary Biological Analysis and Data Integration/Meta-analysis. LC-MS datasets account for the majority of reuse, whereas NMR and GC-MS datasets contribute at a lower level. Data reuse has increased over time, with a noticeable acceleration in recent years. At the dataset level, reuse follows a long-tail distribution in which a small subset of datasets, used mainly as community benchmarks, accounts for repeated reuse. These results provide a conservative estimate of public metabolomics data reuse and show that public datasets are predominantly used for methodological and computational applications. They also indicate that reuse is under-detected when dataset identifiers are not consistently reported, highlighting the need for standardised dataset citation to improve the traceability and recognition of reuse. Statement of significance of the study: The impact of public metabolomics repositories has been difficult to assess due to the lack of reliable evidence distinguishing true data reuse from simple literature citations. This study addresses that gap by providing a conservative, manually validated baseline for confirmed reuse of datasets from the MetaboLights data repository.
The analysis clarifies how MetaboLights datasets are used in practice, showing that reuse is concentrated in a limited number of datasets and is dominated by computational and methodological applications.
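The identifier-based candidate search can be sketched as a regular-expression pass over publication text, followed by per-dataset counting once reuse has been confirmed. This is a hypothetical illustration, not the authors' pipeline: retrieval from publisher databases is omitted, the function names are invented, and the manual validation step is represented only by whatever mapping is passed to `reuse_counts`. The toy input also shows why reuse is under-detected when no identifier is reported.

```python
import re
from collections import Counter

# MetaboLights accessions look like MTBLS1 or MTBLS79; \b avoids matching inside longer tokens.
MTBLS_RE = re.compile(r"\bMTBLS\d+\b", re.IGNORECASE)


def find_candidate_datasets(texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each publication ID to the set of MTBLS accessions mentioned in its text."""
    return {pub: {m.upper() for m in MTBLS_RE.findall(text)} for pub, text in texts.items()}


def reuse_counts(confirmed: dict[str, set[str]]) -> Counter:
    """Count how many publications reuse each dataset (input assumed already validated)."""
    counts = Counter()
    for accessions in confirmed.values():
        counts.update(accessions)
    return counts


# Invented example texts; pub3 reuses data but never states an accession, so it is missed.
texts = {
    "pub1": "We reanalysed MTBLS1 and MTBLS79.",
    "pub2": "Data from mtbls1 were used.",
    "pub3": "Public metabolomics data were downloaded.",
}
candidates = find_candidate_datasets(texts)
counts = reuse_counts(candidates)
```

Sorting `counts.most_common()` over a real corpus is one simple way to inspect the long-tail distribution described above.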
Znabu, B. F.; Atif, Z.
Native mass spectrometry (MS) is a central analytical method for characterizing intact proteins, antibody-drug conjugates, and non-covalent assemblies, and it is increasingly the deciding measurement in biotherapeutic development pipelines. A single screening attempt requires days of expression, purification, and buffer exchange into ammonium acetate, followed by 30 to 60 minutes of optimization on a Q-Exactive UHMR or comparable instrument. To our knowledge, no published sequence-based predictor currently estimates native MS suitability before experimental screening. We curated 634 unique proteins with documented native MS outcomes, drawn from a 232-protein hand-curated base set, 358 entries recovered from the RCSB PDB by full-text searches for native MS terminology, and 44 evidence-based extractions from supplementary tables across 80 EuropePMC papers. We trained four model variants on this benchmark: a 36-feature BioPython physicochemical baseline, an ESM-2 linear probe, an ESM-2 PCA-256 random forest, and a combined model that concatenates ESM-2 PCA components with BioPython features. All variants were evaluated under cluster-aware 5-fold cross-validation (GroupKFold over ESM-2 embedding-similarity clusters, our defense against homology leakage) with isotonic calibration; standard stratified 5-fold cross-validation is reported as a sensitivity analysis. Under cluster-aware cross-validation, the combined model achieved an AUC of 0.869 ± 0.036, close to its stratified-CV value (0.873) and above the BioPython baseline (0.852). The ESM-2-only variants showed AUC drops of 0.024 to 0.046 between stratified and cluster-aware splits, indicating that part of the apparent ESM-2 contribution under standard CV reflects homology leakage.
Negative recall was 9.4% under cluster-aware splitting versus 26.0% under stratified splitting, confirming that the model's apparent failure-detection capability was substantially inflated by within-fold homology. We report both numbers and treat the cluster-aware values as the primary results. We release the curated dataset, the trained model, and an interactive web tool at nativeready.netlify.app. In its current form, NativeReady should be interpreted primarily as a positive-suitability triage tool; failure prediction remains limited by the scarcity of experimentally documented negative cases. We propose a user-contribution mechanism to accumulate real failure data over time. NativeReady is, to our knowledge, the first open benchmark and triage model designed specifically for sequence-based prediction of native MS suitability.
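The cluster-aware evaluation rests on one rule: every member of a similarity cluster lands in the same fold, so no cluster is split between training and test data. The minimal pure-Python sketch below illustrates that rule together with the negative-recall metric. It is not NativeReady's code. The greedy largest-cluster-first balancing resembles the idea behind scikit-learn's `GroupKFold` but is an assumption here, and the label convention (1 = native MS suitable, 0 = documented failure) is also assumed. Embedding-based clustering and isotonic calibration are not shown.

```python
from collections import defaultdict


def group_kfold(groups: list[int], k: int) -> list[tuple[list[int], list[int]]]:
    """Split sample indices into k (train, test) folds so that no group spans both."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    # Place the largest clusters first, each into the currently smallest fold.
    fold_members = [[] for _ in range(k)]
    for g in sorted(members, key=lambda g: len(members[g]), reverse=True):
        smallest = min(range(k), key=lambda f: len(fold_members[f]))
        fold_members[smallest].extend(members[g])
    splits = []
    for f in range(k):
        test = sorted(fold_members[f])
        test_set = set(test)
        train = [i for i in range(len(groups)) if i not in test_set]
        splits.append((train, test))
    return splits


def negative_recall(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of true failures (label 0) that the model also predicts as failures."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        return float("nan")
    return sum(1 for p in negatives if p == 0) / len(negatives)


# Invented cluster assignments for nine proteins (e.g., from embedding similarity).
groups = [0, 0, 0, 1, 1, 2, 3, 3, 4]
splits = group_kfold(groups, k=3)
```

Because homologous proteins share a cluster, a model cannot score well on a test protein simply by having seen a close relative in training, which is why the cluster-aware numbers are the more conservative ones.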
Richards, D. M.; Zhai, F.; Li, S.; Yu, Q.
Thermal proteome profiling (TPP) and its higher-throughput derivative, the proteome integral solubility alteration (PISA) assay, measure changes in protein thermal stability upon ligand binding or other perturbations and have been widely adopted in drug discovery and biomedical research. Although the PISA workflow is straightforward, key parameters, including detergent concentration, the method used to remove denatured aggregates, and temperature range selection, vary across studies and can markedly influence assay outcomes. Yet these factors have not been systematically evaluated, limiting rational experimental design and data interpretation. Here, using a combination of TPP, PISA, tandem mass tag (TMT)-based multiplexing, and computational simulation, we systematically characterize these parameters based on the melting behavior of ~9,000 proteins. We find that reducing detergent concentration elevates apparent Tm by 1.5–2 °C proteome-wide, and that removing aggregates by filtration rather than centrifugation further alters the measurements. We leverage these observations to optimize PISA and then apply the optimized conditions to identify the aminopeptidase NPEPPS as a previously uncharacterized binding partner of angiotensin II, a key vasoactive peptide hormone in blood pressure regulation. Together, this work provides a general framework for assay design and data interpretation and extends the utility of PISA beyond small molecules to dissecting peptide-protein interactions, an increasingly important modality in drug discovery.
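A simple simulation can illustrate how a ligand-induced Tm shift becomes a PISA readout. The sketch below makes several assumptions and is not the authors' simulation: it uses a two-state sigmoid melting curve with an arbitrary slope parameter, assumes equal-volume pooling of aliquots heated at each temperature (so the pooled soluble signal is proportional to the sum of soluble fractions), and ignores measurement noise. It also shows why temperature range selection matters. A range far below Tm yields almost no signal, even for a genuine stabilization.

```python
import math


def soluble_fraction(temp: float, tm: float, slope: float = 2.0) -> float:
    """Two-state sigmoid: fraction of a protein remaining soluble at `temp` (°C)."""
    return 1.0 / (1.0 + math.exp((temp - tm) / slope))


def pisa_signal(temps, tm: float, slope: float = 2.0) -> float:
    """PISA-style readout: pooled soluble signal, summed over the heated aliquots."""
    return sum(soluble_fraction(t, tm, slope) for t in temps)


def pisa_log2_fc(temps, tm_control: float, delta_tm: float, slope: float = 2.0) -> float:
    """log2 ratio of pooled soluble signal, treated (Tm + delta_tm) versus control."""
    treated = pisa_signal(temps, tm_control + delta_tm, slope)
    control = pisa_signal(temps, tm_control, slope)
    return math.log2(treated / control)


# A 2 °C stabilization of a protein with Tm = 50 °C, read out over two temperature ranges.
near = pisa_log2_fc(range(44, 60, 2), tm_control=50.0, delta_tm=2.0)  # 44-58 °C
far = pisa_log2_fc(range(30, 40, 2), tm_control=50.0, delta_tm=2.0)   # 30-38 °C
```

With these assumptions the 44–58 °C range, which brackets Tm, gives a clearly positive log2 fold change, whereas the 30–38 °C range gives a value close to zero because the protein is almost fully soluble at every temperature in both conditions.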